
A Witness Two-Sample Test

All the utility functions we implemented are in the file testing_utils.py.

The provided implementation builds on two other code bases:

  1. We provide an implementation to estimate KFDA witnesses. To do so, we extend the FALKON code base (Rudi et al., 2017; Meanti et al., 2020; https://github.com/FalkonML/falkon) with a method for KFDA, implemented via the new class FdaFalkon; a sketch of the underlying objective follows this list. We discuss this in Appendix C of our paper.

  2. We provide benchmark experiments for deep optimized kernels. To this end, we reuse the experiments of Liu et al. (2020) (https://github.com/fengliu90/DK-for-TST) and extend them with the proposed witness approaches.
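
For orientation, the KFDA witness can be viewed as the solution of a regularized Fisher discriminant problem in the RKHS $\mathcal{H}$ of the kernel. A standard textbook formulation (a sketch only; see Appendix C of the paper for the exact variant used here) is

$$h^\star = \arg\max_{h \in \mathcal{H}} \frac{\big(\mathbb{E}_P[h(X)] - \mathbb{E}_Q[h(Y)]\big)^2}{\mathrm{Var}_P[h(X)] + \mathrm{Var}_Q[h(Y)] + \lambda \lVert h \rVert_{\mathcal{H}}^2},$$

where $\lambda > 0$ is the regularization parameter (the penalty argument of FdaFalkon in the example below).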

Installing

The installation was tested with Python 3.6.

  1. Create a virtual environment.
  2. pip install -r requirements.txt
  3. cd kfda_falkon
  4. pip install -e . (installs the Falkon modules)

Reproduce Experimental Results

Navigate to the experiment directory and follow the instructions in the respective Readme.md.

UPDATE: The sample size (per distribution) used for the HIGGS experiments is actually twice the sample size reported in the paper. For example, a reported sample size of 1000 means that 1000 samples each from P and Q were used for training, and another 1000 each for testing. Altogether, that is 2000 samples from each distribution (consistent with Liu et al.) and 4000 samples in total.
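
In numbers (hypothetical variable names, for illustration only):

n_reported = 1000                  # sample size as stated in the paper
n_per_dist_train = n_reported      # from each of P and Q for training
n_per_dist_test = n_reported       # from each of P and Q for testing
n_per_dist = n_per_dist_train + n_per_dist_test   # 2000 per distribution
n_total = 2 * n_per_dist                          # 4000 samples overall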

Minimal Working Example (mwe.py): Estimating p-values

from sklearn import datasets, model_selection
import numpy as np
import torch
from scipy.stats import norm

import falkon
from testing_utils import snr_score


# Two samples: the make_circles labels indicate inner (factor=.9) vs. outer circle,
# i.e., which of the two distributions each point was drawn from.
X, Y = datasets.make_circles(n_samples=1000, shuffle=False, noise=0.1, factor=.9)
# ---- prepare data ---- #
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=0.5, shuffle=True)
X_train = torch.from_numpy(X_train).to(dtype=torch.float32)
X_test = torch.from_numpy(X_test).to(dtype=torch.float32)
Y_train = torch.from_numpy(Y_train).to(dtype=torch.float32).reshape(-1, 1)
Y_test = torch.from_numpy(Y_test).to(dtype=torch.float32).reshape(-1, 1)
# Recode the labels to +1 (sample P) and -1 (sample Q).
Y_train[Y_train == 0] = -1
Y_test[Y_test == 0] = -1

# Define the kernel and regularization strength.
kernel = falkon.kernels.GaussianKernel(1.)
regularization = 1.
# M is the number of Nystroem centers; here we use all training points.
kfda_witness = falkon.FdaFalkon(kernel=kernel, penalty=regularization, M=len(X_train))

# ---- STAGE I - Train KFDA witness -----
kfda_witness.fit(X_train, Y_train)

# ---- STAGE II - Compute p-value -----
# sqrt(n) * SNR is asymptotically standard normal under H0, yielding a one-sided p-value.
snr = snr_score(kfda_witness, X_test, Y_test)
tau = np.sqrt(len(X_test)) * snr
p = 1 - norm.cdf(tau)

print("p value = ", p)
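
For intuition, snr_score (implemented in testing_utils.py) evaluates the fitted witness on the held-out data and returns an empirical signal-to-noise ratio. The following is only a rough illustrative sketch of such a score, assuming a predict method and ±1 labels; it is not the repository's actual implementation:

def snr_sketch(model, X, Y):
    # Witness values on held-out points; labels are +1 (sample P) and -1 (sample Q).
    w = model.predict(X).flatten()
    y = Y.flatten()
    # Difference of witness means between the two samples,
    # normalized by the pooled standard deviation of the witness values.
    diff = w[y == 1].mean() - w[y == -1].mean()
    return (diff / w.std()).item()

Because the statistic is asymptotically normal under H0, the p-value comes directly from the Gaussian CDF and no permutation resampling is needed to calibrate the test.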

References

  1. F. Liu, W. Xu, J. Lu, G. Zhang, A. Gretton, and D. J. Sutherland. Learning deep kernels for non-parametric two-sample tests. ICML, 2020.
  2. A. Rudi, L. Carratino, and L. Rosasco. Falkon: An optimal large scale kernel method. NeurIPS, 2017.
  3. G. Meanti, L. Carratino, L. Rosasco, and A. Rudi. Kernel methods through the roof: Handling billions of points efficiently. NeurIPS, 2020.
